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Abstract 

We have performed the NLO QCD global fit of BCDMS, NMC, HI and ZEUS 
data with full account of point-to-point correlations using the Bayesian approach 
to the treatment of systematic errors. Parton distributions in the proton associated 
with experimental uncertainties, including both statistical and systematic ones were 
obtained. The gluon distribution in the wide region of x was determined and it 
turned out to be softer than in the global analysis using prompt photon data. We 
also obtained the robust estimate of OsiMz) = 0.1146 ± 0.0036 (75% C.L.) based 
on Chebyshev's inequality, which is compatible with the earlier determination of 
from DIS data, but with less dependence on high twist effects. 



Preprint submitted to Elsevier Science 



1 February 2008 



1 Introduction 



Recently it lias been argued ^ that parton distributions functions (PDFs) ob- 
tained from the global data analysis (e.g. [0,0) have the principal shortcomings 
arising from the absence of experimental errors associated with the parame- 
ters of these distributions. Indeed, the only and often used way to evaluate the 
spread of predictions given by these PDFs is to compare results of calculations 
with the various parametrizations input. It is evident that if different authors 
use the same theoretical model and similar data sets this procedure cannot 
account for real uncertainties occuring due to statistical and systematic fluc- 
tuations of data used to extract PDFs. These uncertainties can be evaluated 
using the propagation of these fluctuations into the dispersion of PDFs pa- 
rameters or PDFs themselves. The conclusive treatment of systematic errors, 
which are usually dominating, is often limited since they are presented in the 
publications as the combinations from separate sources. For the recent deep 
inelastic scattering (DIS) data from HERA as well as older ones from SPS 
full error matrix are fortunately available. Deep inelastic scattering of charged 
leptons remains the cleanest source of information on PDFs among the other 
relevant processes and the careful analysis of these data including propagation 
of systematics can be valuable for exploring the nucleon structure. The han- 
dling with statistical fluctuations is well understood on the basis of probability 
theory, meanwhile the elaborating of systematic ones is the subject of various 
approaches. 

In one of them, based on the classical treatment of probability, one considers 
the systematic shifts as additional unknown methodical parameters arising 
due to a poor knowledge of experimental apparatus. Within this approach 
one usually tries to determine these parameters using some statistical esti- 
mator, say minimization, to fit the data with these parameters left free. 
The obtained values are further considered as a reasonable approximation to 
the true values and data are corrected to account for these systematic shifts. 
As to systematic errors of theoretical model parameters, they are evaluated 
inverting full error matrix, including both physical and methodical parameter 
derivatives. In most cases, the only kinds of the systematic errors which can be 
determined in the pure classical approach are the systematics connected to the 
general normalization of the data. Other methodical parameters are strongly 
correlated with each other and with physical parameters which leads to their 
huge errors and unreasonable central values. This situation can be readily ex- 
plained qualitatively: as far as one turned out to be unable to determine the 
parameters of the apparatus using the special tests and measurements it is 
doubtful that one can do it using some cross section measurements indirectly 
related to the resolving of the methodical ambiguities. 

Another, much more productive approach, is based on the Bayesian treatment 
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of systematic uncertainties. In this approach they are considered as random 
variables with the postulated/evaluated probabihty distribution function and 
systematic errors are evaluated within general statistical procedures alongside 
with the statistical errors. For the analysis of the modern DIS data as a rule 
having a number of noticeable systematic errors this approach is the unique 
possibility to account for the point-to-point correlation of data. This is the 
Bayesian approach that we use in our paper to obtain the complete propa- 
gation of systematic uncertainties of DIS data into the uncertainties of the 
resulting PDFs. 



2 Theoretical and experimental input 



2. 1 Data used in the fit 



As a subject of our analysis we use the data for deep inelastic muon/electron 
hydrogen/deuterium scattering p|-[l2| cut to reduce the effects of high twists 
in the following way 



W > A GeV, > 9 GeV^, 



where W and are common DIS variables. The number of data points for 
each experiment after the cut is presented in Table 1. For data of ZEUS 

Table 1 

The number of data points (NDP) and x^/NDP for the analysed data sets. 



Experiment 


BCDMS 


NMC 


HI 


ZEUS 


total 


NDP 


558 


190 


147 


166 


1061 


X'/NDP 


0.97 


1.43 


0.91 


2.00 


1.20 



collaboration asymmetric systematic errors were averaged. As to BCDMS data 
we suppose the total correlation of systematic errors for proton and deuterium 
cross sections. 



2.2 Probability model of the data 



If the experimental data with K sources of multiplicative systematics are ex- 
plicitly described by a theoretical model they can be presented in the Bayesian 



3 



approach as 



K 

Vi = ifi + mai) ■ (l + Y^XkVi), 

k=l 

where fi = fi{6^) is the value predicted by the theoretical model with pa- 
rameter 9^, fii and are independent random variables, ai and rif - statistic 
and systematic errors from the k-th source for i-th measurement, i = 1 ■ ■ ■ N, 
k = 1 ■ ■ ■ K, N is the total number of points in the data set. If the data 
come from the data sample with a large number of events in every bin, /j, 
are normally distributed, as to A, the only assumption we are making is that 
they have zero average and unity dispersions. Within this ansatz individual 
measurements are correlated and their correlation matrix Cij is given by 

K 

C^j = E fiVifjVj + 5ija^ 

k=l 

where 5ij is the Kronecker symbol. To obtain the estimator of the parameter 
9^ we minimize the quadratic form 

N 

X\e)= Y^[f^{9)-y,]E,,[f,{9)-y,l (1) 

where Eij is inverted correlation matrix. We should note that through this 
paper we treat the normalization errors within this formalism as well as other 
systematics are regarded as multiplicative, which is almost always the case for 
counting experiments. The minimization was made with the help of MINUIT 
package supplied with the modules improving the numerical stability of 
calculations 0]. 

If Afc are normally distributed and 7]^ <^ 1, set obeys the multidimensional 
Gaussian distribution with correlations and 9 has the minimal possible dis- 
persion. The systematic errors calculated as the propagation of uncertainties 
in apparatus parameters or Monte-Carlo corrections are well believed to be 
Gaussian distributed. At the same time we have shown that even this is not 
the case this estimator has reduced dispersion comparing with the simplest 
without account of correlations. One should underline that as far as we use 
the correct covariance matrix built using predicted averages for the measure- 
ments our estimator would be asymptotically unbiased and hence does not 
suffer from the bias discussed in 
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2.3 QCD input 



Physical model for describing the considered data is based on the parton model 
with pQCD evolution of the light quarks and gluon distributions in the proton 
defined at initial value of Qo = 9 GeV^. These distributions were evolved 



using DGLAP equations [14| in the NLO within MS factorization scheme 



TB| . As to the contributions of c-quark and 6-quark they were calculated 
using the LO formula from ||16| setting rric = 1-5 GeV, rrib = 4.5 GeV and the 



renormalization/factorization scale equal to yQ"^ + 4m^ Our QCD evolution 
program was tested as suggested in and demonstrated numerical precision 



of 0(0.1%) in the kinematic region covered by the analysed data. Adjusting 
the functional form of PDFs we've started from rather general and widely 
used expressions 

XQiix, Qo) = Ax'^^il - xf^il + 7iv^ + 7^x), (2) 



and then reduced the number of free parameters keeping the quality of data 
description. The resulting functional form of PDFs at Qo looks like 



xdvix,Qo) = xdsix,Qo) = ^x'^^'^il - xf ' 



2 Aq 

xuv{x,Qo) = — 7Fa:''" (l -x)^"(l + 7^x), xus{x,Qo) = — r/„x"^"(l - x)''^" 

xG{x,Qo) = Acx^'^il-xf'^, xss{x,Qo) = ^Vsx""- {I - x)'- . 



We did not consider , and Aq as free parameters, they were calculated 
from other parameters using partons' number/momentum conservation. As to 
Ns it is defined by the relation 

1 

2 J x[us{x, Qo) + ds{x, Qo) + Qo)]dx = As. 





Forecasting the final results we note that after trial fits it has been found 
that rju is well compatible with unity and it is fixed at this value. We fixed 
r]s = 0.5, which is compatible with recent CCFR findings |13[ and also adopted 
O'su = cisd = o-ss, bss = ipsu + bgd) 1'^ siucc our data do not allow for a separate 
determination of these parameters. 
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We calculate strong coupling constant as{Q) from the fitted parameter as{Mz) 
by numerical solving of the NLO renormalization equation 



asiQ) asiMz 



27r \Mz 



[(3 + 1/asiMz) 



where 



This approach prevents one from the uncertainties occuring for the approxi- 
mate solutions based on the expansion in the inverse powers of ln(Q), which 
are ~ 0.001 at the scale of evolution from Mz to 0{GeV) (cf. [|18|), i.e. is 
comparable with the standard deviation of a{Mz)- The number of the active 
fermions ^/ is changing from 4 to 5 due to b-quark threshold at the Q = rub 
keeping continuity of as{Q)- 



2.4 Corrections to the basic formula and data 



2.4.1 Target mass correction 



In addition to the pure pQCD evolution we applied to the calculated value of 
F2 the so-called target mass corrections [O] using the relation 



where 



2x AM^x'^ 



and M is the nucleon mass. The contribution to this correction of the order 
of M^/Q'^ presented in turned out to be negligible for all considered data. 
Target mass correction is most essential for the BCDMS data, where it ranges 
from -1% to +7%, having the average module related to the statistical error as 
large as 0.16. We should note that our way of introducing this correction differs 
from the one applied in pO[ and consisting of the substitution F2{x,Q) — ^ 
-^2(^5 Q)- Due to this difference in our case the correction exhibits crossover 
from negative to positive values at x ~ 0.5 instead of x ~ 0.4 like in and 



differs in the magnitude. For the NMC data this correction is significantly 
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q 

X 



0.3 



0.25 - 



0.15 - 




0.05 - 



Fig. 1. R = ul/ (Tt calculated using our resulting PDFs (solid line) and the band of 
RfhAC [22] (dashed hues) at = g GeV"^ . 

smaller ( range - [-1%,0%], relative average - 0.05 ) and for ZEUS and HI 
data is absolutely negligible. 



2.4-2 Reduction to the common R = Oi^j Gt 

All the data on Fi were reduced to the common value oi R = ai/ar comprised 
the NLO contribution from light quarks and gluon, the LO contribution from 
c-quark and 6-quark, and the target mass correction included (see for the 



compilation of the relevant formula). The value of R was calculated during 
the fit for every new set of the PDFs parameters (its final form is presented 
on Fig.l). This reduction is most essential at the smallest x accessible in an 
experiment, mainly, due the maximum sensitivity of the data to the value of 
R in these regions. The value of this correction is different for the considered 
data sets. For the BCDMS data the value of this correction is in the range of 
[-3.5%,0%] (the average relative module - 0.10). This collaboration calculated 
R from pQCD predictions, but used the larger gluon distributions than in our 
final set. The NMC data are renormalized by 0.10 statistical error in average 
(range - [-1.5%,2%]). For the ZEUS data, which exhibit the most sensitivity 
to the choice of R due to the large span in lepton scattering variable ?/, this 
correction calculated with the final set of our PDFs ranges from -3% to 0% 
with the average relative module of 0.04 and as to the HI data they are affected 
to the same extent. 
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Table 2 

The fitted parameters of PDFs with the full experimental errors including statistics 
and systematics. 







745 ± 024 


Sg3. 




159 ± 036 




b,, 


3.823 ± 0.070 




Oj or! 


-0.1885 ± 0.0072 




12 


0.56 ±0.28 




bsd 


7.5 ± 1.3 




ad 


0.875 ± 0.066 






1.0 ±0.12 




hd 


5.32 ±0.22 




bsu 


10.61 ±0.95 


Glue 


ac 


-0.267 ± 0.043 




Vs 


0.5 ± 1.0 




bo 


8.2 ± 1.5 






0.1146 ±0.0018 



Resuming we should note that this correction, being not very large in average, 
is significant for separate data points on the edge of the experimental accep- 
tance. Since at small x the value of R heavily depends on the G{x,Q), our 
approach imposes the additional constraints on its value. The residual influ- 
ence of different ansatzes for R used in the calculation of radiative corrections 
in different experiments is believed to be small. 



2.4-3 Fermi motion correction in deuterium 



Deuterium data were corrected for Fermi motion using procedure ||2^ with the 
Paris wave function for deuterium |2^]. This correction was also calculated 
iteratively to obtain fully consistent set of PDFs. The value of R = ai/cxT 
for deuteron was adopted to be unchanged under this correction, we have 
proved that this adoption is of minor importance for the final results. For 
the calculation of the relevant integrals we used program P3| , which exhibited 
better numerical stability than standard procedures based on the simple Gauss 
algorithm. This correction being maximum at large x ranges from -2% to 
+15% for the BCDMS data and from -2% to -1% for the NMC data, whereas 
its average relative module is about 0.6 for the both experiments. 



3 Results 



The central values and the full experimental errors of the adjustable param- 
eters obtained after the minimization of (|I|) are presented in Table 2 (full 
correlation matrix of the fitted parameters is available by the request to the 
author). To decrease the model dependence of our predictions, calculating the 
covariance matrix we released parameters r]u and rjs, keeping their central val- 
ues intact. The resulting values are presented in Table 1. On the average 
the model describes the data fairly well. One can heavily ascribe rather large 
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Fig. 2. The description of BCDMS data with our PDFs. The data and curves are 
scaled by factor 1.2^^~% where i runs from 1 for the highest x bin to 11 for the 
lowest one. 

obtained for the NMC and ZEUS data to the shortcoming of the theoretical 
model, as far as as the BCDMS and HI data having comparable statistics and 
lying in the nearby kinematic regions are described by this model perfectly. 

The most probable explanation is that some systematic errors in these exper- 
iments are not Gaussian distributed. The average bias of the data against our 
model, calculated as 



B = 



f-y 



turned out to be 0.10, i.e. is statistically insignificant. The principal difference 
of our analysis from other global fits is that we do not renormalize data and 
as far the BCDMS data are usually shifted down, our resulting F2 curves are 
slightly higher than others at large x. The data on F2 reduced to the common 
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Fig. 3. The same as in Fig. 2 for the NMC data [i runs from 1 to 12). For the 
presentation purposes we pictured combined energy data with convenient binning. 

value of R together with our curves are presented on Figs. 2-5, where the error 
bars correspond to the squared sum of statistics and systematics. The selected 
set of PDFs is presented on Figs. 6-9. The strange sea is not shown since from 
the analysed data we can obtain only a weak upper limit for this value. As we 
have mentioned above the distribution of our PDFs parameters defined mainly 
by the distribution of systematic uncertainties may differ from Gaussian and 
then for the robust error bands estimate one should better use Chebyshev's 
inequality. The bands presented on these pictures correspond to two standard 
deviations, which corresponds to the 75% robust confidence level. Although 
we do not use in our analysis prompt photon data, which is often considered as 
an unique source of gluon distribution at moderate x, through the kinematic 
region of x = [0.0001, 0.5] gluon distributions is determined rather precisely 
and better than in the earlier analysis [|l^,^]. One could achieve this due 
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Fig. 4. The description of HI data with our PDFs. The data and curves are shifted 
by 5.1 — 0.3i, where i runs from 1 for the highest x bin to 16 for the lowest one. 




(GeV^) 

Fig. 5. The description of ZEUS data with our PDFs. The data and curves are 
shifted by 4.5 — 0.3i, where i runs from 1 for the highest x bin to 14 for the lowest 
one. 
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Fig. 6. Gluon distribution obtained in our analysis. Solid lines correspond to 
q2 ^ 9 GeV'^, dashed - to = iqooO GeV'^. Dotted line gives MRS(Rl) and 
dashed-dotted - CTEQ4M predictions at = 9 GeV^. 




Fig. 7. The same as in Fig.6 for the nonstrange sea. 
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Fig. 8. The same as in Fig.6 for the valence quarks. 




Fig. 9. The same as in Fig.6 for the nonstrange sea asymmetry. 
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to the measurement of F2 at small x, which defines gluon distribution in 
this region and provides momentum constraint to determine it at larger x as 
well. As to the quark distributions they are determined much more precisely. 
We should, however, point out that the obtained PDFs and their errors are 
certainly model dependent. Say, releasing the condition Qsu = dsd = o,ss sig- 
nificantly increases the errors of sea distributions at the small x. Analogous 
effect arises if one adds more polynomial terms to the initial PDFs. The model 
dependence is inevitable in such analysis since one cannot determine the con- 
tinual functional form of a distribution having the limited set of measurements 
and without additional constraints. In our case this model dependence is more 
pronounced for the quark distributions because the considered data are well 
known to have limited potential in the discrimination of sea and valence quarks 
meanwhile the gluon distribution and as{Mz) are less model dependent. It is 
well understood as far as the latter are defined from the F2 derivatives, less 
sensitive to the variation of separate quark distributions. At small x and large 
Q one can observe shrinking the error bands of gluon distribution. This refiects 
a well known property of the DGLAP equation based on the dominance of the 
singular terms and leading to the focusing of any input gluon distribution to 
the universal form |27]. 



For the comparison we also present the parametrizations MRS(Rl) and 
CTEQ4M on these figures. This comparison is limited because of the lack 
of the error bands for their PDFs, but any way is more conclusive than the 
comparison of two curves without any error bands moreover that one can sup- 
pose the error bands for MRS and CTEQ PDFs to be smaller than ours since 
these groups use more data in the fit. We observe the statistically significant 
difference of our gluon distribution with those given by MRS and CTEQ sets 
at large x, which can be ascribed to using in these analysis the data on prompt 
photon production. The interpretation of these data has been recently recog- 
nized to suffer from the large ambiguities and the alternative analysis of 
prompt photon data with the improved theoretical treatment of these ambi- 
guities give much lower gluon distribution at moderate x |29|, compatible with 
ours. As to the discrepancies in d-quaik and, to less extent, w-quark distribu- 
tions at moderate x, the additional investigation showed that they are partially 
explained by the influence of target mass and Fermi motion corrections. One 
can note that the larger values of quark distributions in this region of x can 
help to explain the excess of the recent data on jet production from Fermilab 
collider over NLO QCD predictions in the region of = 200 — 400 GeV, 
where the basic contribution comes from quark-quark scattering (viz @,|]). 
In addition to the above, all these discrepancies can also originate due to the 
possible numerical inaccuracies in MRS's and CTEQ's QCD evolution codes 
reported recently and difference in the a.^ values. The difference in the sea 
value at x ~ 0.3 seems to be statistically insigniflcant and can disappear after 
inclusion of more data in the analysis. 
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For the value of as{Mz) the robust estimate is 
as{Mz) = 0.1146 ± 0.0036 (75% C.L.), 



compatible with but less sensitive to the higher twist contribution. This 
estimate is not essentially biased if the PDFs functional form is changed from 
(0) to our final form and hence we can conclude that these estimates are, in 
the good approximation, model independent. 



4 Conclusion 



The Bayesian treatment of systematic errors is the clear and efficient method 
in the analysis of data with numerous sources of systematic errors and in par- 
ticular data on DIS scattering. This approach allows for a straightforward and 
correct account of point-to-point correlations contrary to widely used 'simpli- 
fication' consisting of combining statistic and systematic errors in quadrature. 
The certain suspicions that the estimator using covariance matrix suffer from 
the bias proved to be irrelevant if one uses the estimator inspired by the max- 
imum likelihood function. First time the quark and gluon distributions from 
the global fit with the full account of experimental errors are obtained. These 
PDFs can be extremely useful for further phenomenological studies. Having 
estimation of PDFs' error bands one can conclusively compare the results of 
various global fits, PDFs extracted from different processes and evaluate the 
statistical significance of theoretical uncertainties in the fitted formula. At 
last, calculation of cross sections for other processes, based on PDFs are more 
meaningful if one can account for PDFs' uncertainties. 
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